library(tidyverse)
library(stringr)
library(ggplot2)
library(dygraphs)
library(xts)
library(plotly)

Download data

We are working with a dataset from a Portuguese bank. The data categorizes direct marketing efforts (phone calls) designed to sell term deposit products. The dataset was donated to UCI’s Machine Learning Repository. The goal is to predict which clients are most likely to subscribe to a term deposit. Data range from May 2008 to November 2010

download.file(url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip",
              destfile = "data/bank-additional.zip")
trying URL 'https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip'
Content type 'application/zip' length 444572 bytes (434 KB)
==================================================
downloaded 434 KB
unzip(zipfile = "data/bank-additional.zip", 
      exdir = "data", 
      files = "bank-additional/bank-additional-full.csv", 
      junkpaths = TRUE)

Read data

Read data and clean up column names. Remove unnecessary columns and records. Impute year and create a column for date.

raw_data <- read_delim(file = "data/bank-additional-full.csv", 
                       delim = ";", 
                       col_types = cols(nr.employed = col_number()),
                       progress = FALSE) 
Error in read_delim(file = "data/bank-additional-full.csv", delim = ";",  : 
  could not find function "read_delim"

Insights

There were far more contacts in 2008 and 2009.

# Monthly rollup
dat <- all_data %>%
  group_by(month_idx, year, month, date) %>%
  summarize(subscribe = sum(term_deposit == "yes"), 
            total = n(),
            percent = 100 * mean(term_deposit == "yes"))
# Percent
xts(dat$percent, dat$date) %>%
  dygraph(main = "Percent subscribing") %>%
  dyRangeSelector()

# Totals
with(dat, xts(cbind(Total = total, Subscribe = subscribe), date)) %>%
  dygraph(main = "Total contacts and subscriptions") %>%
  dyRangeSelector()

Term deposits are correlated to some extent with macro economic indicators.

p <- all_data %>%
  group_by(month_idx, year, month) %>%
  summarize(euribor3m = mean(euribor3m),
            cons_conf_idx = mean(cons_conf_idx),
            cons_price_idx = mean(cons_price_idx),
            term_deposit = 100 * mean(term_deposit == "yes")) %>%
  ungroup %>%
  mutate_at(vars(euribor3m, cons_conf_idx, cons_price_idx),
                 function(x) (x - min(x)) / (max(x) - min(x))) %>%
  gather("key", "value", euribor3m, cons_conf_idx, cons_price_idx)
ggplot(p, aes(month_idx, value, color = key)) +
  geom_line() +
  ggtitle("Economic Indicators") +
  xlab("Month") +
  ylab("Index")

ggplot(p, aes(term_deposit, value, color = year)) +
  geom_point() +
  facet_grid(~key) +
  ggtitle("Economic Indicators vs Term Deposits") +
  xlab("Term Depoist Percent") +
  ylab("Index")

Lower rates of euribor3m were associated with associated with more term desposits.

ggplot(all_data, aes(euribor3m, color = term_deposit)) +
  geom_density(adjust=0.05)

Clients with a successful prior outcome were much more likley to subscribe to a term deposit.

all_data %>%
  group_by(term_deposit) %>%
  count(prior_outcome) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  select(-n) %>%
  spread(term_deposit, pct)

Contacts who subscribe to term deposits skew slightly older and younger.

qplot(term_deposit, age, data = all_data, geom = "boxplot")

October saw an unusually high percent of term deposits in 2008.

p <- all_data %>%
  group_by(year, month) %>%
  summarize(term_deposit = 100 * mean(term_deposit == "yes")) %>%
  ggplot(aes(month, term_deposit, fill = year)) +
  geom_bar(stat = "identity", position="dodge")
ggplotly(p)

Blue collar workers are less likely to subscribe.

all_data %>%
  group_by(term_deposit) %>%
  count(job) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  select(-n) %>%
  spread(term_deposit, pct)

There were far more contacts in 2008 than in 2010

all_data %>%
  group_by(year, month) %>%
  summarize(term_deposit = sum(term_deposit == "yes"), count = n()) %>%
  ggplot(aes(count, term_deposit, color = year)) +
  geom_point() +
  xlab("Total contacts") +
  ylab("Term Deposit Subscriptions") +
  ggtitle("Contact volume")

Conclusions

There is a lot of variability in total contacts and in percentage of term deposit subscriptions over time. There were far more contacts made in 2008 when the three month Euribor rate was high. As total contacts dropped, the percentage of term deposit subsriptions increased.

Subscriptions are correlated with a variety of factors, including age and job to name a few. There is also a strong correlation with prior outcomes, as one might expect.

This analysis only analyzed a few select variables. The next goal should be to predict term deposit subscriptions as a function of all the variables in the data.

LS0tCnRpdGxlOiAiQmFuayBNYXJrZXRpbmcgRXhwbG9yYXRvcnkgQW5hbHlzaXMiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCmBgYHtyLCBtZXNzYWdlPUZBTFNFfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeShzdHJpbmdyKQpsaWJyYXJ5KGdncGxvdDIpCmxpYnJhcnkoZHlncmFwaHMpCmxpYnJhcnkoeHRzKQpsaWJyYXJ5KHBsb3RseSkKYGBgCgojIyBEb3dubG9hZCBkYXRhCgpXZSBhcmUgd29ya2luZyB3aXRoIGEgZGF0YXNldCBmcm9tIGEgUG9ydHVndWVzZSBiYW5rLiAgVGhlIGRhdGEgY2F0ZWdvcml6ZXMgZGlyZWN0IG1hcmtldGluZyBlZmZvcnRzIChwaG9uZSBjYWxscykgZGVzaWduZWQgdG8gc2VsbCB0ZXJtIGRlcG9zaXQgcHJvZHVjdHMuICBUaGUgW2RhdGFzZXRdKGh0dHBzOi8vYXJjaGl2ZS5pY3MudWNpLmVkdS9tbC9kYXRhc2V0cy9CYW5rK01hcmtldGluZykgd2FzIGRvbmF0ZWQgdG8gVUNJJ3MgTWFjaGluZSBMZWFybmluZyBSZXBvc2l0b3J5LiBUaGUgZ29hbCBpcyB0byBwcmVkaWN0IHdoaWNoIGNsaWVudHMgYXJlIG1vc3QgbGlrZWx5IHRvIHN1YnNjcmliZSB0byBhIHRlcm0gZGVwb3NpdC4gRGF0YSByYW5nZSBmcm9tIE1heSAyMDA4IHRvIE5vdmVtYmVyIDIwMTAKCgpgYGB7ciwgbWVzc2FnZT1GQUxTRX0KZG93bmxvYWQuZmlsZSh1cmwgPSAiaHR0cHM6Ly9hcmNoaXZlLmljcy51Y2kuZWR1L21sL21hY2hpbmUtbGVhcm5pbmctZGF0YWJhc2VzLzAwMjIyL2JhbmstYWRkaXRpb25hbC56aXAiLAogICAgICAgICAgICAgIGRlc3RmaWxlID0gImRhdGEvYmFuay1hZGRpdGlvbmFsLnppcCIpCgp1bnppcCh6aXBmaWxlID0gImRhdGEvYmFuay1hZGRpdGlvbmFsLnppcCIsIAogICAgICBleGRpciA9ICJkYXRhIiwgCiAgICAgIGZpbGVzID0gImJhbmstYWRkaXRpb25hbC9iYW5rLWFkZGl0aW9uYWwtZnVsbC5jc3YiLCAKICAgICAganVua3BhdGhzID0gVFJVRSkKYGBgCgojIyBSZWFkIGRhdGEKClJlYWQgZGF0YSBhbmQgY2xlYW4gdXAgY29sdW1uIG5hbWVzLiBSZW1vdmUgdW5uZWNlc3NhcnkgY29sdW1ucyBhbmQgcmVjb3Jkcy4gSW1wdXRlIHllYXIgYW5kIGNyZWF0ZSBhIGNvbHVtbiBmb3IgZGF0ZS4KCmBgYHtyLCBtZXNzYWdlPUZBTFNFfQojIFJlYWQgZGF0YQojIHJhd19kYXRhIDwtIHJlYWRfZGVsaW0oZmlsZSA9ICJkYXRhL2JhbmstYWRkaXRpb25hbC1mdWxsLmNzdiIsIAojICAgICAgICAgICAgICAgICAgICAgICAgZGVsaW0gPSAiOyIsIAojICAgICAgICAgICAgICAgICAgICAgICAgY29sX3R5cGVzID0gY29scyhuci5lbXBsb3llZCA9IGNvbF9udW1iZXIoKSksCiMgICAgICAgICAgICAgICAgICAgICAgICBwcm9ncmVzcyA9IEZBTFNFKSAKIyAKIyAjIEZvcm1hdCBjb2x1bW4gbmFtZXMKIyBuYW1lcyhyYXdfZGF0YSkgPC0gc3RyX3JlcGxhY2VfYWxsKG5hbWVzKHJhd19kYXRhKSwgIlsuXSIsICJfIikKIyAKIyAjIFJlbmFtZSBhbmQgcmVtb3ZlIGNvbHVtbnMKIyBhbGxfZGF0YSA8LSByYXdfZGF0YSAlPiUKIyAgICAgcmVuYW1lKHRlcm1fZGVwb3NpdCA9IHksCiMgICAgICAgICAgICBwcmlvcl9vdXRjb21lID0gcG91dGNvbWUsCiMgICAgICAgICAgICBwZXJzb25hbF9sb2FuID0gbG9hbiwKIyAgICAgICAgICAgIGhvdXNpbmdfbG9hbiA9IGhvdXNpbmcsCiMgICAgICAgICAgICBpbl9kZWZhdWx0ID0gZGVmYXVsdCkgJT4lCiMgICBtdXRhdGUoam9iID0gc3RyX3JlcGxhY2VfYWxsKGpvYiwgIlsuLV0iLCAiIikpICU+JQojICAgIyBUaGVyZSdzIHNvbWV0aGluZyB3ZWlyZCB3aXRoIGNhbXBhaWduLi4uIHRoZXJlIGlzIHZlcnkgc3BvdHR5IGNvdmVyYWdlIGFmdGVyIDM1ICh1cCB0byA1NikuCiMgICBtdXRhdGUoY2FtcGFpZ24gPSBwbWluKGNhbXBhaWduLCAzNSkpICU+JQojICAgbXV0YXRlKHRvdGFsX2NvbnRhY3RzID0gY2FtcGFpZ24gKyBwcmV2aW91cykgJT4lCiMgICBzZWxlY3QoLWMobnJfZW1wbG95ZWQsIGVtcF92YXJfcmF0ZSwgcHJldmlvdXMsIHBkYXlzLCBjYW1wYWlnbiwgZHVyYXRpb24pKSAlPiUKIyAgIG5hLm9taXQKIyAKIyAjIEltcHV0ZSB5ZWFyIGFuZCBkYXRlCiMgYWxsX2RhdGEgPC0gYWxsX2RhdGEgJT4lCiMgICBtdXRhdGUobW9udGhfaWR4ID0gaWZlbHNlKG1vbnRoICE9IGxhZyhtb250aCwgZGVmYXVsdCA9ICJtYXkiKSwgMSwgMCkpICU+JQojICAgbXV0YXRlKG1vbnRoX2lkeCA9IGN1bXN1bShtb250aF9pZHgpKSAlPiUKIyAgIG11dGF0ZSh5ZWFyID0gZmluZEludGVydmFsKG1vbnRoX2lkeCwgYyg3LCAxNykpKSAlPiUKIyAgIG11dGF0ZSh5ZWFyID0gY2FzZV93aGVuKHllYXIgPT0gMCB+ICIyMDA4IiwKIyAgICAgICAgICAgICAgICAgICAgICAgICAgIHllYXIgPT0gMSB+ICIyMDA5IiwKIyAgICAgICAgICAgICAgICAgICAgICAgICAgIHllYXIgPT0gMiB+ICIyMDEwIikpICU+JQojICAgbXV0YXRlKGRhdGUgPSBhcy5EYXRlKHBhc3RlMCgiMSIsIG1vbnRoLCB5ZWFyKSwgIiVkJWIlWSIpKQoKYWxsX2RhdGEgPC0gcmVhZFJEUygnZGF0YS9hbGxfZGF0YS5SRFMnKQpgYGAKCiMjIEluc2lnaHRzCgpUaGVyZSB3ZXJlIGZhciBtb3JlIGNvbnRhY3RzIGluIDIwMDggYW5kIDIwMDkuCgpgYGB7cn0KCiMgTW9udGhseSByb2xsdXAKZGF0IDwtIGFsbF9kYXRhICU+JQogIGdyb3VwX2J5KG1vbnRoX2lkeCwgeWVhciwgbW9udGgsIGRhdGUpICU+JQogIHN1bW1hcml6ZShzdWJzY3JpYmUgPSBzdW0odGVybV9kZXBvc2l0ID09ICJ5ZXMiKSwgCiAgICAgICAgICAgIHRvdGFsID0gbigpLAogICAgICAgICAgICBwZXJjZW50ID0gMTAwICogbWVhbih0ZXJtX2RlcG9zaXQgPT0gInllcyIpKQoKIyBQZXJjZW50Cnh0cyhkYXQkcGVyY2VudCwgZGF0JGRhdGUpICU+JQogIGR5Z3JhcGgobWFpbiA9ICJQZXJjZW50IHN1YnNjcmliaW5nIikgJT4lCiAgZHlSYW5nZVNlbGVjdG9yKCkKCiMgVG90YWxzCndpdGgoZGF0LCB4dHMoY2JpbmQoVG90YWwgPSB0b3RhbCwgU3Vic2NyaWJlID0gc3Vic2NyaWJlKSwgZGF0ZSkpICU+JQogIGR5Z3JhcGgobWFpbiA9ICJUb3RhbCBjb250YWN0cyBhbmQgc3Vic2NyaXB0aW9ucyIpICU+JQogIGR5UmFuZ2VTZWxlY3RvcigpCmBgYAoKVGVybSBkZXBvc2l0cyBhcmUgY29ycmVsYXRlZCB0byBzb21lIGV4dGVudCB3aXRoIG1hY3JvIGVjb25vbWljIGluZGljYXRvcnMuCgpgYGB7cn0KcCA8LSBhbGxfZGF0YSAlPiUKICBncm91cF9ieShtb250aF9pZHgsIHllYXIsIG1vbnRoKSAlPiUKICBzdW1tYXJpemUoZXVyaWJvcjNtID0gbWVhbihldXJpYm9yM20pLAogICAgICAgICAgICBjb25zX2NvbmZfaWR4ID0gbWVhbihjb25zX2NvbmZfaWR4KSwKICAgICAgICAgICAgY29uc19wcmljZV9pZHggPSBtZWFuKGNvbnNfcHJpY2VfaWR4KSwKICAgICAgICAgICAgdGVybV9kZXBvc2l0ID0gMTAwICogbWVhbih0ZXJtX2RlcG9zaXQgPT0gInllcyIpKSAlPiUKICB1bmdyb3VwICU+JQogIG11dGF0ZV9hdCh2YXJzKGV1cmlib3IzbSwgY29uc19jb25mX2lkeCwgY29uc19wcmljZV9pZHgpLAogICAgICAgICAgICAgICAgIGZ1bmN0aW9uKHgpICh4IC0gbWluKHgpKSAvIChtYXgoeCkgLSBtaW4oeCkpKSAlPiUKICBnYXRoZXIoImtleSIsICJ2YWx1ZSIsIGV1cmlib3IzbSwgY29uc19jb25mX2lkeCwgY29uc19wcmljZV9pZHgpCgpnZ3Bsb3QocCwgYWVzKG1vbnRoX2lkeCwgdmFsdWUsIGNvbG9yID0ga2V5KSkgKwogIGdlb21fbGluZSgpICsKICBnZ3RpdGxlKCJFY29ub21pYyBJbmRpY2F0b3JzIikgKwogIHhsYWIoIk1vbnRoIikgKwogIHlsYWIoIkluZGV4IikKCmdncGxvdChwLCBhZXModGVybV9kZXBvc2l0LCB2YWx1ZSwgY29sb3IgPSB5ZWFyKSkgKwogIGdlb21fcG9pbnQoKSArCiAgZmFjZXRfZ3JpZCh+a2V5KSArCiAgZ2d0aXRsZSgiRWNvbm9taWMgSW5kaWNhdG9ycyB2cyBUZXJtIERlcG9zaXRzIikgKwogIHhsYWIoIlRlcm0gRGVwb2lzdCBQZXJjZW50IikgKwogIHlsYWIoIkluZGV4IikKCmBgYAoKTG93ZXIgcmF0ZXMgb2YgYGV1cmlib3IzbWAgd2VyZSBhc3NvY2lhdGVkIHdpdGggYXNzb2NpYXRlZCB3aXRoIG1vcmUgdGVybSBkZXNwb3NpdHMuIAoKYGBge3J9CmdncGxvdChhbGxfZGF0YSwgYWVzKGV1cmlib3IzbSwgY29sb3IgPSB0ZXJtX2RlcG9zaXQpKSArCiAgZ2VvbV9kZW5zaXR5KGFkanVzdD0wLjA1KQpgYGAKCkNsaWVudHMgd2l0aCBhIHN1Y2Nlc3NmdWwgcHJpb3Igb3V0Y29tZSB3ZXJlIG11Y2ggbW9yZSBsaWtsZXkgdG8gc3Vic2NyaWJlIHRvIGEgdGVybSBkZXBvc2l0LgoKYGBge3J9CmFsbF9kYXRhICU+JQogIGdyb3VwX2J5KHRlcm1fZGVwb3NpdCkgJT4lCiAgY291bnQocHJpb3Jfb3V0Y29tZSkgJT4lCiAgbXV0YXRlKHBjdCA9IHJvdW5kKDEwMCAqIG4gLyBzdW0obiksIDEpKSAlPiUKICBzZWxlY3QoLW4pICU+JQogIHNwcmVhZCh0ZXJtX2RlcG9zaXQsIHBjdCkKYGBgCgpDb250YWN0cyB3aG8gc3Vic2NyaWJlIHRvIHRlcm0gZGVwb3NpdHMgc2tldyBzbGlnaHRseSBvbGRlciBhbmQgeW91bmdlci4KCmBgYHtyfQpxcGxvdCh0ZXJtX2RlcG9zaXQsIGFnZSwgZGF0YSA9IGFsbF9kYXRhLCBnZW9tID0gImJveHBsb3QiKQpgYGAKCk9jdG9iZXIgc2F3IGFuIHVudXN1YWxseSBoaWdoIHBlcmNlbnQgb2YgdGVybSBkZXBvc2l0cyBpbiAyMDA4LgoKYGBge3J9CnAgPC0gYWxsX2RhdGEgJT4lCiAgZ3JvdXBfYnkoeWVhciwgbW9udGgpICU+JQogIHN1bW1hcml6ZSh0ZXJtX2RlcG9zaXQgPSAxMDAgKiBtZWFuKHRlcm1fZGVwb3NpdCA9PSAieWVzIikpICU+JQogIGdncGxvdChhZXMobW9udGgsIHRlcm1fZGVwb3NpdCwgZmlsbCA9IHllYXIpKSArCiAgZ2VvbV9iYXIoc3RhdCA9ICJpZGVudGl0eSIsIHBvc2l0aW9uPSJkb2RnZSIpCgpnZ3Bsb3RseShwKQpgYGAKCkJsdWUgY29sbGFyIHdvcmtlcnMgYXJlIGxlc3MgbGlrZWx5IHRvIHN1YnNjcmliZS4KCmBgYHtyfQphbGxfZGF0YSAlPiUKICBncm91cF9ieSh0ZXJtX2RlcG9zaXQpICU+JQogIGNvdW50KGpvYikgJT4lCiAgbXV0YXRlKHBjdCA9IHJvdW5kKDEwMCAqIG4gLyBzdW0obiksIDEpKSAlPiUKICBzZWxlY3QoLW4pICU+JQogIHNwcmVhZCh0ZXJtX2RlcG9zaXQsIHBjdCkKYGBgCgpUaGVyZSB3ZXJlIGZhciBtb3JlIGNvbnRhY3RzIGluIDIwMDggdGhhbiBpbiAyMDEwCgpgYGB7cn0KYWxsX2RhdGEgJT4lCiAgZ3JvdXBfYnkoeWVhciwgbW9udGgpICU+JQogIHN1bW1hcml6ZSh0ZXJtX2RlcG9zaXQgPSBzdW0odGVybV9kZXBvc2l0ID09ICJ5ZXMiKSwgY291bnQgPSBuKCkpICU+JQogIGdncGxvdChhZXMoY291bnQsIHRlcm1fZGVwb3NpdCwgY29sb3IgPSB5ZWFyKSkgKwogIGdlb21fcG9pbnQoKSArCiAgeGxhYigiVG90YWwgY29udGFjdHMiKSArCiAgeWxhYigiVGVybSBEZXBvc2l0IFN1YnNjcmlwdGlvbnMiKSArCiAgZ2d0aXRsZSgiQ29udGFjdCB2b2x1bWUiKQpgYGAKCiMjIENvbmNsdXNpb25zCgpUaGVyZSBpcyBhIGxvdCBvZiB2YXJpYWJpbGl0eSBpbiB0b3RhbCBjb250YWN0cyBhbmQgaW4gcGVyY2VudGFnZSBvZiB0ZXJtIGRlcG9zaXQgc3Vic2NyaXB0aW9ucyBvdmVyIHRpbWUuIFRoZXJlIHdlcmUgZmFyIG1vcmUgY29udGFjdHMgbWFkZSBpbiAyMDA4IHdoZW4gdGhlIHRocmVlIG1vbnRoIEV1cmlib3IgcmF0ZSB3YXMgaGlnaC4gQXMgdG90YWwgY29udGFjdHMgZHJvcHBlZCwgdGhlIHBlcmNlbnRhZ2Ugb2YgdGVybSBkZXBvc2l0IHN1YnNyaXB0aW9ucyBpbmNyZWFzZWQuCgpTdWJzY3JpcHRpb25zIGFyZSBjb3JyZWxhdGVkIHdpdGggYSB2YXJpZXR5IG9mIGZhY3RvcnMsIGluY2x1ZGluZyBhZ2UgYW5kIGpvYiB0byBuYW1lIGEgZmV3LiBUaGVyZSBpcyBhbHNvIGEgc3Ryb25nIGNvcnJlbGF0aW9uIHdpdGggcHJpb3Igb3V0Y29tZXMsIGFzIG9uZSBtaWdodCBleHBlY3QuCgpUaGlzIGFuYWx5c2lzIG9ubHkgYW5hbHl6ZWQgYSBmZXcgc2VsZWN0IHZhcmlhYmxlcy4gVGhlIG5leHQgZ29hbCBzaG91bGQgYmUgdG8gcHJlZGljdCB0ZXJtIGRlcG9zaXQgc3Vic2NyaXB0aW9ucyBhcyBhIGZ1bmN0aW9uIG9mIGFsbCB0aGUgdmFyaWFibGVzIGluIHRoZSBkYXRhLg==